This data contains the top 1000 movies by their user scores on IMDb. It’s available here on Kaggle.
After browsing the data, I came up with some questions for analysis:
What is the disparity between user and critic scores? What’s the most controversial film between users and critics?
How do the top-grossing movies score? Is there a clear correlation between score and gross earnings?
For each actor and director, what’s the total gross of all the movies they’ve worked on?
Which genres are most prominent among the top movies?
Which genre does each actor and director prefer?
library(tidyverse)
library(knitr)
library(kableExtra)
library(plotly)
library(forcats)
library(png)
library(ggimage)
library(grid)
imdb_df <- read.csv("/Users/taylorparchment/Downloads/imdb_top_1000.csv")
glimpse(imdb_df)
## Rows: 1,000
## Columns: 16
## $ Poster_Link <chr> "https://m.media-amazon.com/images/M/MV5BMDFkYTc0MGEtZmN…
## $ Series_Title <chr> "The Shawshank Redemption", "The Godfather", "The Dark K…
## $ Released_Year <chr> "1994", "1972", "2008", "1974", "1957", "2003", "1994", …
## $ Certificate <chr> "A", "A", "UA", "A", "U", "U", "A", "A", "UA", "A", "U",…
## $ Runtime <chr> "142 min", "175 min", "152 min", "202 min", "96 min", "2…
## $ Genre <chr> "Drama", "Crime, Drama", "Action, Crime, Drama", "Crime,…
## $ IMDB_Rating <dbl> 9.3, 9.2, 9.0, 9.0, 9.0, 8.9, 8.9, 8.9, 8.8, 8.8, 8.8, 8…
## $ Overview <chr> "Two imprisoned men bond over a number of years, finding…
## $ Meta_score <int> 80, 100, 84, 90, 96, 94, 94, 94, 74, 66, 92, 82, 90, 87,…
## $ Director <chr> "Frank Darabont", "Francis Ford Coppola", "Christopher N…
## $ Star1 <chr> "Tim Robbins", "Marlon Brando", "Christian Bale", "Al Pa…
## $ Star2 <chr> "Morgan Freeman", "Al Pacino", "Heath Ledger", "Robert D…
## $ Star3 <chr> "Bob Gunton", "James Caan", "Aaron Eckhart", "Robert Duv…
## $ Star4 <chr> "William Sadler", "Diane Keaton", "Michael Caine", "Dian…
## $ No_of_Votes <int> 2343110, 1620367, 2303232, 1129952, 689845, 1642758, 182…
## $ Gross <chr> "28,341,469", "134,966,411", "534,858,444", "57,300,000"…
colSums(is.na(imdb_df))
## Poster_Link Series_Title Released_Year Certificate Runtime
## 0 0 0 0 0
## Genre IMDB_Rating Overview Meta_score Director
## 0 0 0 157 0
## Star1 Star2 Star3 Star4 No_of_Votes
## 0 0 0 0 0
## Gross
## 0
The “Gross” column of this data is a character vector with comma separators, so it needs to be converted to a number. It also contains blank strings, which aren’t detected as missing values (hence the 0 above).
# Convert comma-formatted strings to numbers; blank strings become NA
imdb_df <- imdb_df %>%
mutate(Gross = parse_number(Gross))
# Count the Gross column's missing values
sum(is.na(imdb_df$Gross))
## [1] 169
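For comparison, here is the same conversion in base R, as a sketch on three hypothetical strings (the two numbers are copied from the glimpse() output; the blank one is made up). It makes the blank-to-NA step explicit, and like parse_number() it produces doubles rather than integers.

```r
# Hypothetical raw values in the same format as the Gross column
gross_raw <- c("28,341,469", "", "134,966,411")

# Remove thousands separators, mark blanks as missing, then convert to double
gross_clean <- gsub(",", "", gross_raw, fixed = TRUE)
gross_clean[gross_clean == ""] <- NA
gross_num <- as.numeric(gross_clean)

gross_num               # 28341469 NA 134966411
sum(is.na(gross_num))   # 1
```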
Additionally, I later noticed that “Joe Russo” is listed as “Star1” several times, but it appears he was a co-director on those movies rather than an actor. For simplicity’s sake, I’ll set these cells to NA so he doesn’t appear as an actor in any charts.
imdb_df$Star1[imdb_df$Star1 == "Joe Russo"] <- NA
# Top movies by user score
top_user <- imdb_df %>%
arrange(desc(IMDB_Rating)) %>%
select(Series_Title, Released_Year, IMDB_Rating, Meta_score, Gross)
# Top movies by metascore
top_meta <- imdb_df %>%
arrange(desc(Meta_score)) %>%
select(Series_Title, Released_Year, Meta_score, IMDB_Rating, Gross)
knitr::kable(head(top_user, 20))
| Series_Title | Released_Year | IMDB_Rating | Meta_score | Gross |
|---|---|---|---|---|
| The Shawshank Redemption | 1994 | 9.3 | 80 | 28341469 |
| The Godfather | 1972 | 9.2 | 100 | 134966411 |
| The Dark Knight | 2008 | 9.0 | 84 | 534858444 |
| The Godfather: Part II | 1974 | 9.0 | 90 | 57300000 |
| 12 Angry Men | 1957 | 9.0 | 96 | 4360000 |
| The Lord of the Rings: The Return of the King | 2003 | 8.9 | 94 | 377845905 |
| Pulp Fiction | 1994 | 8.9 | 94 | 107928762 |
| Schindler’s List | 1993 | 8.9 | 94 | 96898818 |
| Inception | 2010 | 8.8 | 74 | 292576195 |
| Fight Club | 1999 | 8.8 | 66 | 37030102 |
| The Lord of the Rings: The Fellowship of the Ring | 2001 | 8.8 | 92 | 315544750 |
| Forrest Gump | 1994 | 8.8 | 82 | 330252182 |
| Il buono, il brutto, il cattivo | 1966 | 8.8 | 90 | 6100000 |
| The Lord of the Rings: The Two Towers | 2002 | 8.7 | 87 | 342551365 |
| The Matrix | 1999 | 8.7 | 73 | 171479930 |
| Goodfellas | 1990 | 8.7 | 90 | 46836394 |
| Star Wars: Episode V - The Empire Strikes Back | 1980 | 8.7 | 82 | 290475067 |
| One Flew Over the Cuckoo’s Nest | 1975 | 8.7 | 83 | 112000000 |
| Hamilton | 2020 | 8.6 | 90 | NA |
| Gisaengchung | 2019 | 8.6 | 96 | 53367844 |
knitr::kable(head(top_meta, 20))
| Series_Title | Released_Year | Meta_score | IMDB_Rating | Gross |
|---|---|---|---|---|
| The Godfather | 1972 | 100 | 9.2 | 134966411 |
| Casablanca | 1942 | 100 | 8.5 | 1024560 |
| Rear Window | 1954 | 100 | 8.4 | 36764313 |
| Lawrence of Arabia | 1962 | 100 | 8.3 | 44824144 |
| Vertigo | 1958 | 100 | 8.3 | 3200000 |
| Citizen Kane | 1941 | 100 | 8.3 | 1585634 |
| Trois couleurs: Rouge | 1994 | 100 | 8.1 | 4043686 |
| Fanny och Alexander | 1982 | 100 | 8.1 | 4971340 |
| Il conformista | 1970 | 100 | 8.0 | 541940 |
| Sweet Smell of Success | 1957 | 100 | 8.0 | NA |
| Boyhood | 2014 | 100 | 7.9 | 25379975 |
| Notorious | 1946 | 100 | 7.9 | 10464000 |
| City Lights | 1931 | 99 | 8.5 | 19181 |
| Singin’ in the Rain | 1952 | 99 | 8.3 | 8819028 |
| Touch of Evil | 1958 | 99 | 8.0 | 2237659 |
| The Night of the Hunter | 1955 | 99 | 8.0 | 654000 |
| Shichinin no samurai | 1954 | 98 | 8.6 | 269061 |
| North by Northwest | 1959 | 98 | 8.3 | 13275000 |
| Metropolis | 1927 | 98 | 8.3 | 1236166 |
| Pan’s Labyrinth | 2006 | 98 | 8.2 | 37634615 |
Interestingly, browsing through the top 20 movies by each score, it appears the movies valued by average viewers and by critics are quite different. Looking at the top Metascore movies especially, several have noticeably lower user scores than the top user-rated films. I want to know which movie has the biggest difference between the two groups.
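One way to put a number on how different the two top-20 lists are is to count the titles they share. A minimal sketch, with a hypothetical helper top_n_overlap() and a toy example using the first three titles of each table above:

```r
# Hypothetical helper: titles in both top-n lists, and their share of n
top_n_overlap <- function(titles_a, titles_b, n = 20) {
  common <- intersect(head(titles_a, n), head(titles_b, n))
  list(common = common, share = length(common) / n)
}

# Toy example: the first three titles of each table above
by_users   <- c("The Shawshank Redemption", "The Godfather", "The Dark Knight")
by_critics <- c("The Godfather", "Casablanca", "Rear Window")

res <- top_n_overlap(by_users, by_critics, n = 3)
res$common  # "The Godfather"
res$share   # 1/3
```

On the full data the call would be top_n_overlap(top_user$Series_Title, top_meta$Series_Title). Going by the two tables above, The Godfather is the only title in both top 20s, although ties at the cutoff (several films share a user score of 8.6 or a Metascore of 98) mean the exact overlap depends on how ties are ordered.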
score_disparity <- imdb_df %>%
mutate(score_diff = abs(IMDB_Rating * 10 - Meta_score)) %>%
select(Series_Title, Released_Year, score_diff, IMDB_Rating, Meta_score,Genre) %>%
arrange(desc(score_diff)) %>%
head(10)
knitr::kable(score_disparity)
| Series_Title | Released_Year | score_diff | IMDB_Rating | Meta_score | Genre |
|---|---|---|---|---|---|
| I Am Sam | 2001 | 49 | 7.7 | 28 | Drama |
| Tropa de Elite | 2007 | 47 | 8.0 | 33 | Action, Crime, Drama |
| The Butterfly Effect | 2004 | 46 | 7.6 | 30 | Drama, Sci-Fi, Thriller |
| Seven Pounds | 2008 | 40 | 7.6 | 36 | Drama |
| Kai po che! | 2013 | 37 | 7.7 | 40 | Drama, Sport |
| Fear and Loathing in Las Vegas | 1998 | 35 | 7.6 | 41 | Adventure, Comedy, Drama |
| Pink Floyd: The Wall | 1982 | 34 | 8.1 | 47 | Drama, Fantasy, Music |
| The Boondock Saints | 1999 | 34 | 7.8 | 44 | Action, Crime, Thriller |
| Bound by Honor | 1993 | 33 | 8.0 | 47 | Crime, Drama |
| Predator | 1987 | 33 | 7.8 | 45 | Action, Adventure, Sci-Fi |
The movie with the biggest difference between user and critical score is the 2001 drama I Am Sam. All of the movies here were liked by users and disliked by critics, and the majority of the movies have drama listed as at least one genre.
This lets us know what movies users liked and not critics, but I’d like to know the other direction too.
# Look only for movies critics liked
score_disparity2 <- imdb_df %>%
mutate(score_diff = Meta_score - IMDB_Rating * 10) %>%
select(Series_Title, Released_Year, score_diff, IMDB_Rating, Meta_score,Genre) %>%
arrange(desc(score_diff)) %>%
head(10)
knitr::kable(score_disparity2)
| Series_Title | Released_Year | score_diff | IMDB_Rating | Meta_score | Genre |
|---|---|---|---|---|---|
| Boyhood | 2014 | 21 | 7.9 | 100 | Drama |
| Notorious | 1946 | 21 | 7.9 | 100 | Drama, Film-Noir, Romance |
| Il conformista | 1970 | 20 | 8.0 | 100 | Drama |
| Sweet Smell of Success | 1957 | 20 | 8.0 | 100 | Drama, Film-Noir |
| The Lady Vanishes | 1938 | 20 | 7.8 | 98 | Mystery, Thriller |
| A Hard Day’s Night | 1964 | 20 | 7.6 | 96 | Comedy, Music, Musical |
| Trois couleurs: Rouge | 1994 | 19 | 8.1 | 100 | Drama, Mystery, Romance |
| Fanny och Alexander | 1982 | 19 | 8.1 | 100 | Drama |
| Touch of Evil | 1958 | 19 | 8.0 | 99 | Crime, Drama, Film-Noir |
| The Night of the Hunter | 1955 | 19 | 8.0 | 99 | Crime, Drama, Film-Noir |
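The two queries differ only in how the difference is computed: the first takes the absolute value, so it measures disagreement in either direction, while the second keeps the sign, so sorting it in descending order surfaces critic-favored films. A small sketch using the scores of I Am Sam and Boyhood copied from the tables above, with the user rating multiplied by 10 to share the Metascore’s 0 to 100 scale (the round() call is an extra guard against floating-point noise, not something the original code does):

```r
# Scores copied from the tables in this post
films <- data.frame(
  Series_Title = c("I Am Sam", "Boyhood"),
  IMDB_Rating  = c(7.7, 7.9),
  Meta_score   = c(28, 100)
)

# Put both scores on a 0-100 scale
films$user_100    <- round(films$IMDB_Rating * 10)
films$signed_diff <- films$Meta_score - films$user_100  # > 0: critics liked it more
films$abs_diff    <- abs(films$signed_diff)             # size of disagreement only

films
```

I Am Sam comes out at -49 (users higher) and Boyhood at +21 (critics higher), matching the score_diff values in the two tables.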
The score differences here are much smaller. This is probably because the dataset was selected by user score: every film on it has a high user rating, so critics can’t rate a film far above users, whereas they can rate it far below.
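To make the asymmetry concrete: the lowest user rating visible in the tables above is 7.6 and the highest is 9.3. Assuming no film on the list rates below 7.6, the two gaps are bounded as follows:

$$
\begin{aligned}
\text{critics minus users} &= \text{Meta\_score} - 10 \cdot \text{IMDB\_Rating} \le 100 - 10 \cdot 7.6 = 24,\\
\text{users minus critics} &= 10 \cdot \text{IMDB\_Rating} - \text{Meta\_score} \le 10 \cdot 9.3 - \text{Meta\_score}.
\end{aligned}
$$

The observed maximum of 21 sits just under the first cap, while the second is limited only by how low critics go (I Am Sam’s Metascore of 28 gives 77 - 28 = 49).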
I’ll get an idea of the highest-earning movies.
# Keep the 100 highest-grossing movies
top_grossing <- imdb_df %>%
arrange(desc(Gross)) %>%
select(Series_Title, Released_Year, IMDB_Rating, Meta_score, Gross) %>%
head(100)
knitr::kable(head(top_grossing, 10))
| Series_Title | Released_Year | IMDB_Rating | Meta_score | Gross |
|---|---|---|---|---|
| Star Wars: Episode VII - The Force Awakens | 2015 | 7.9 | 80 | 936662225 |
| Avengers: Endgame | 2019 | 8.4 | 78 | 858373000 |
| Avatar | 2009 | 7.8 | 83 | 760507625 |
| Avengers: Infinity War | 2018 | 8.4 | 68 | 678815482 |
| Titanic | 1997 | 7.8 | 75 | 659325379 |
| The Avengers | 2012 | 8.0 | 69 | 623279547 |
| Incredibles 2 | 2018 | 7.6 | 80 | 608581744 |
| The Dark Knight | 2008 | 9.0 | 84 | 534858444 |
| Rogue One | 2016 | 7.8 | 65 | 532177324 |
| The Dark Knight Rises | 2012 | 8.4 | 78 | 448139099 |
First, I want to see whether how much a movie grosses is indicative of how well average viewers liked it.
# Top grossing movies vs user scores
# Graph a scatter plot and line of best fit
gross_vs_user <- ggplot(top_grossing, aes(x = Gross, y = IMDB_Rating, text = Series_Title)) +
geom_point() +
geom_smooth(aes(group=-1), method="lm", se = FALSE) +
scale_x_continuous(limits = c(0, 1e9), breaks = seq(0, 1e9, by = 100000000), labels = c("0", "100M", "200M", "300M", "400M", "500M", "600M", "700M", "800M", "900M", "1B")) +
scale_y_continuous(limits = c(7.5, 9.2)) +
labs(title = "Top Grossing Movies vs. User Score",
y = "User Score",
x = "Gross Earnings in Dollars") + theme_minimal() +
coord_cartesian(xlim = c(0, 1e9), ylim = c(7.5, 9.2))
ggplotly(gross_vs_user)
I’ll try the same thing with metascore.
gross_vs_meta <- ggplot(top_grossing, aes(x = Gross, y = Meta_score, text = Series_Title)) +
geom_point() +
geom_smooth(aes(group=-1), method="lm", se = FALSE) +
scale_x_continuous(limits = c(0, 1e9), breaks = seq(0, 1e9, by = 100000000), labels = c("0", "100M", "200M", "300M", "400M", "500M", "600M", "700M", "800M", "900M", "1B")) +
scale_y_continuous(limits = c(50, 100)) +
labs(title = "Top Grossing Movies vs. Metascore",
y = "Metascore",
x = "Gross Earnings in Dollars") + theme_minimal()
ggplotly(gross_vs_meta)
Individual top-grossing movies score differently with users and critics, but the two spreads look similar, with the Metascore spread a bit wider. Both show a very weak relationship, if any, with gross earnings.
Perhaps the relationship is stronger from the other direction: starting from the top-rated movies rather than the top-grossing ones, I would expect user or critic scores to be more indicative of gross earnings.
user_vs_gross <- ggplot(head(top_user, 100), aes(x = IMDB_Rating, y = Gross, text = Series_Title)) +
geom_point() +
geom_smooth(aes(group=-1), method="lm", se = FALSE) +
labs(title = "Top 100 IMDb Movies by User Score vs. Gross Earnings",
y = "Gross Earnings in Dollars",
x = "User Score") +
scale_y_continuous(breaks = seq(0, 1e9, by = 100000000), labels = c("0", "100M", "200M", "300M", "400M", "500M", "600M", "700M", "800M", "900M", "1B")) + theme_minimal()
ggplotly(user_vs_gross)
The relationship is still less obvious than I would have expected, so I’ll fit a linear model. Note that the model uses every movie with a reported gross, not just the top 100 plotted.
summary(lm(formula = Gross ~ IMDB_Rating, data = top_user))
##
## Call:
## lm(formula = Gross ~ IMDB_Rating, data = top_user)
##
## Residuals:
## Min 1Q Median 3Q Max
## -102820436 -62517252 -41910193 17363997 870372054
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -227376280 106535840 -2.134 0.03311 *
## IMDB_Rating 37172968 13397415 2.775 0.00565 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 109300000 on 829 degrees of freedom
## (169 observations deleted due to missingness)
## Multiple R-squared: 0.009201, Adjusted R-squared: 0.008006
## F-statistic: 7.699 on 1 and 829 DF, p-value: 0.005651
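With a p-value of 0.005651, the slope is statistically significant at the 1% level. However, the R-squared of 0.009 means user score explains less than 1% of the variation in gross earnings, so the relationship is real but very weak.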
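For a regression with a single predictor, R-squared is the square of the Pearson correlation, so a Multiple R-squared of 0.009201 corresponds to a correlation of about 0.096. A quick check of that identity on made-up numbers (not the IMDb data):

```r
# Made-up data purely to illustrate the identity R^2 = cor(x, y)^2
x <- c(7.6, 7.8, 8.0, 8.3, 8.6, 9.0)
y <- c(40, 10, 95, 30, 120, 60)

fit <- lm(y ~ x)
r_squared <- summary(fit)$r.squared
pearson_r <- cor(x, y)

all.equal(r_squared, pearson_r^2)  # TRUE for simple linear regression
```

On the real data, cor(top_user$IMDB_Rating, top_user$Gross, use = "complete.obs") drops the same rows as lm() and should therefore come out at roughly 0.096.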
meta_vs_gross <- ggplot(head(top_meta, 100), aes(x = Meta_score, y = Gross, text = Series_Title)) +
geom_point() +
geom_smooth(aes(group=-1), method="lm", se = FALSE) +
labs(title = "Top 100 IMDb Movies by Metascore vs. Gross Earnings",
y = "Gross Earnings in Dollars",
x = "Metascore") +
scale_y_continuous(breaks = seq(0, 1e9, by = 100000000), labels = c("0", "100M", "200M", "300M", "400M", "500M", "600M", "700M", "800M", "900M", "1B")) + theme_minimal()
ggplotly(meta_vs_gross)
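The connection between Metascore and gross looks even weaker, and the fitted line slopes slightly downward. It seems implausible that a higher Metascore would reduce earnings, so I’ll check whether the slope is distinguishable from zero, again using every movie with both a Metascore and a reported gross.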
summary(lm(formula = Gross ~ Meta_score, data = top_meta))
##
## Call:
## lm(formula = Gross ~ Meta_score, data = top_meta)
##
## Residuals:
## Min 1Q Median 3Q Max
## -87279147 -68401159 -43159626 23025583 862414862
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 96442843 26009397 3.708 0.000224 ***
## Meta_score -277444 331500 -0.837 0.402897
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 113400000 on 748 degrees of freedom
## (250 observations deleted due to missingness)
## Multiple R-squared: 0.0009356, Adjusted R-squared: -0.0004001
## F-statistic: 0.7005 on 1 and 748 DF, p-value: 0.4029
This relationship is not significant (p = 0.40), so the negative slope is indistinguishable from no relationship at all.
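Rather than reading values off the printed summary, the slope, its p-value and a confidence interval can be extracted programmatically. A sketch on simulated data (the seed, sample size and distributions are arbitrary choices, not the IMDb data). For the Metascore model above, p = 0.40 > 0.05 implies that the 95% interval for the Meta_score slope includes zero.

```r
# Simulated data with no real relationship, purely for illustration
set.seed(42)
x <- runif(200, 30, 100)
y <- rnorm(200, mean = 1e8, sd = 1e8)

fit <- lm(y ~ x)
coefs <- coef(summary(fit))

slope   <- coefs["x", "Estimate"]
p_value <- coefs["x", "Pr(>|t|)"]
ci      <- confint(fit, "x", level = 0.95)  # 1 x 2 matrix: lower, upper

# A 95% interval that straddles zero corresponds to p > 0.05 for the slope
c(slope = slope, p = p_value, lower = ci[1, 1], upper = ci[1, 2])
```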
Thinking about this sensibly, it’s reasonable that movies the average viewer likes would earn more than movies that only critics praise. The result is also very likely shaped by the dataset’s bias: these are the top 1000 movies by user rating, so the sample is heavily skewed toward movies users favor. A dataset of the highest-scoring movies by critics could say more about how critical reception relates to earnings.
Next, I’ll look at the actors and directors whose movies have the highest combined gross.
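The selection effect described here is known as range restriction: when a sample keeps only the high end of one variable, the correlation measured inside it is usually weaker than in the full population. A small simulation sketch (simulated data, not IMDb; the true correlation of 0.5 and the top-10% cutoff are arbitrary assumptions):

```r
# Simulated "score" and "gross" with a true correlation of 0.5
set.seed(1)
n <- 100000
score <- rnorm(n)
gross <- 0.5 * score + sqrt(1 - 0.5^2) * rnorm(n)

full_r <- cor(score, gross)

# Keep only the top 10% by score, mimicking a "top-rated" list
top <- score >= quantile(score, 0.9)
top_r <- cor(score[top], gross[top])

# The correlation within the top 10% is noticeably weaker
c(full = full_r, top_10_percent = top_r)
```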
top_grossing_actors <- imdb_df %>%
pivot_longer(cols = starts_with("Star"), values_to = "Actor") %>% # Get each actor in one row
filter(!is.na(Actor)) %>%
group_by(Actor) %>%
summarize(Total_Gross = sum(Gross, na.rm = TRUE), Gross_Per_Movie = mean(Gross, na.rm = TRUE)) %>%
arrange(desc(Total_Gross))
top_grossing_directors <- imdb_df %>%
group_by(Director) %>%
summarize(Total_Gross = sum(Gross, na.rm = TRUE), Gross_Per_Movie = mean(Gross, na.rm = TRUE)) %>%
arrange(desc(Total_Gross))
# Highest gross actor chart
ggplot(head(top_grossing_actors, 10), aes(x = fct_reorder(Actor, Total_Gross), y = Total_Gross)) +
geom_col() +
labs(title = "Total Gross of Actors' Movies",
x = "Actor",
y = "Total Gross Earnings") +
scale_y_continuous(breaks = seq(0, 3e9, by = 500000000), labels = c("0", "500M", "1B", "1.5B", "2B", "2.5B", "3B")) + coord_flip() + theme_minimal()
# Highest gross director chart
ggplot(head(top_grossing_directors, 10), aes(x = fct_reorder(Director, Total_Gross), y = Total_Gross)) +
geom_col() +
labs(title = "Total Gross of Directors' Movies",
x = "Director",
y = "Total Gross Earnings") +
scale_y_continuous(breaks = seq(0, 3e9, by = 500000000), labels = c("0", "500M", "1B", "1.5B", "2B", "2.5B", "3B")) + coord_flip() + theme_minimal()
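Because sum(..., na.rm = TRUE) skips missing values, a movie with no reported gross (169 of them here) still counts toward an actor’s filmography but adds nothing to their total. A sketch on hypothetical toy data (movie and actor names are made up) that also counts how many of each actor’s movies have a gross, which helps when comparing totals:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same shape as imdb_df (two Star columns only)
movies <- tibble(
  Series_Title = c("Movie X", "Movie Y"),
  Star1 = c("Actor A", "Actor B"),
  Star2 = c("Actor B", "Actor C"),
  Gross = c(100, NA)
)

actor_totals <- movies %>%
  # One row per (movie, actor) pair
  pivot_longer(cols = starts_with("Star"), names_to = NULL, values_to = "Actor") %>%
  filter(!is.na(Actor)) %>%
  group_by(Actor) %>%
  summarize(
    Movies = n(),                            # every movie the actor appears in
    Movies_With_Gross = sum(!is.na(Gross)),  # movies that contribute to the total
    Total_Gross = sum(Gross, na.rm = TRUE),  # missing grosses are skipped
    .groups = "drop"
  )

actor_totals
```

Here Actor C appears in one movie but has a total of 0 because that movie’s gross is missing.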
Next, I want to see which movie genre is the most represented in the top thousand movies.
# Make data longer by separating Genre column, count totals
total_genre_count <- imdb_df %>%
separate_rows(Genre, sep = ",\\s*") %>%
group_by(Genre) %>%
summarize(count = n()) %>%
arrange(desc(count))
ggplot(total_genre_count, aes(x = fct_reorder(Genre, count), y = count)) +
geom_col() +
coord_flip() +
theme_minimal() +
labs(title = "Total Genre Count",
x = "Genre",
y = "Total Number of Movies")
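Because separate_rows() gives a movie one row per listed genre, the genre counts add up to more than the number of movies. A base R sketch of the same split, using the first three Genre values from the glimpse() output:

```r
# Genre strings copied from the first three rows of the data
genres <- c("Drama", "Crime, Drama", "Action, Crime, Drama")

# Split each movie's genre list on commas (and any following spaces)
genre_list <- strsplit(genres, ",\\s*")

genre_counts <- table(unlist(genre_list))
genre_counts        # Action 1, Crime 2, Drama 3
sum(genre_counts)   # 6 genre listings for 3 movies

# Share of movies listing each genre (each movie lists a genre at most once)
genre_share <- genre_counts / length(genres)
```

On the full data, the equivalent share would be total_genre_count$count / nrow(imdb_df).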
I’m also curious what some of the most prolific actors’ and directors’ favorite genres are. I’ll see which genres they are listed the most in.
# Count how many times each actor appears in each genre
# Lengthen to one row per (movie, genre, actor), then count by actor and genre
actor_genre_count <- imdb_df %>%
separate_rows(Genre, sep = ",\\s*") %>%
pivot_longer(cols = starts_with("Star"), names_to = NULL, values_to = "Actor") %>%
filter(!is.na(Actor)) %>%
group_by(Actor, Genre) %>%
summarize(count = n()) %>%
arrange(desc(count))
# Get the director's total genres too
director_genre_count <- imdb_df %>%
separate_rows(Genre, sep = ",\\s*") %>%
group_by(Director, Genre) %>%
summarize(count = n()) %>%
arrange(desc(count))
knitr::kable(head(actor_genre_count, 20))
| Actor | Genre | count |
|---|---|---|
| Robert De Niro | Drama | 17 |
| Al Pacino | Drama | 13 |
| Robert De Niro | Crime | 12 |
| Al Pacino | Crime | 11 |
| Brad Pitt | Drama | 9 |
| Christian Bale | Drama | 9 |
| Denzel Washington | Drama | 9 |
| Ethan Hawke | Drama | 9 |
| Leonardo DiCaprio | Drama | 9 |
| Tom Hanks | Drama | 9 |
| Harrison Ford | Action | 8 |
| Johnny Depp | Drama | 8 |
| Aamir Khan | Drama | 7 |
| Ian McKellen | Adventure | 7 |
| Jake Gyllenhaal | Drama | 7 |
| James Stewart | Drama | 7 |
| Morgan Freeman | Drama | 7 |
| Russell Crowe | Drama | 7 |
| Tom Hanks | Adventure | 7 |
| Bill Murray | Comedy | 6 |
knitr::kable(head(director_genre_count, 20))
| Director | Genre | count |
|---|---|---|
| Hayao Miyazaki | Animation | 11 |
| Akira Kurosawa | Drama | 9 |
| Alfred Hitchcock | Mystery | 9 |
| Alfred Hitchcock | Thriller | 9 |
| Hayao Miyazaki | Adventure | 9 |
| Martin Scorsese | Drama | 9 |
| Billy Wilder | Drama | 8 |
| David Fincher | Drama | 8 |
| Martin Scorsese | Crime | 8 |
| Woody Allen | Comedy | 8 |
| Clint Eastwood | Drama | 7 |
| Ingmar Bergman | Drama | 7 |
| Quentin Tarantino | Drama | 7 |
| Stanley Kubrick | Drama | 7 |
| Steven Spielberg | Drama | 7 |
| Charles Chaplin | Comedy | 6 |
| Wes Anderson | Comedy | 6 |
| Alfonso Cuarón | Drama | 5 |
| Alfred Hitchcock | Drama | 5 |
| Andrei Tarkovsky | Drama | 5 |
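The tables above list the most frequent (person, genre) pairs overall, so a prolific actor can appear several times while others don’t appear at all. To answer the original question of which genre each person prefers, one option is to keep only each person’s top genre. A sketch with dplyr’s slice_max() on hypothetical counts; applied to actor_genre_count or director_genre_count, the pipeline would be the same:

```r
library(dplyr)

# Hypothetical counts in the same shape as actor_genre_count
toy_counts <- tibble(
  Actor = c("Actor A", "Actor A", "Actor B", "Actor B", "Actor B"),
  Genre = c("Drama", "Crime", "Comedy", "Drama", "Romance"),
  count = c(5, 3, 2, 2, 1)
)

favorite_genre <- toy_counts %>%
  group_by(Actor) %>%
  # Keep each actor's most frequent genre; tied genres are all kept
  slice_max(count, n = 1, with_ties = TRUE) %>%
  ungroup()

favorite_genre
```

Keeping ties matters here: with only a handful of films per person, many will have two or more genres tied for first, as Actor B does in the toy data.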
I think most people know that audiences and critics tend to value different movies, but it’s interesting to see that the top audience and critic favorites are almost entirely different, and that some movies are especially divisive. Dramas that users liked often received lower critic scores, while older films, especially film noir, are rated extremely highly by critics but only moderately (relative to the rest of this list) by the general audience.
When it comes to earnings and scores, gross doesn’t tell us much about how viewers rated a movie. Higher audience scores are associated with higher gross, but the relationship is very weak (R-squared under 0.01) despite being statistically significant. Critic scores show no significant relationship at all.
Among the actors whose movies have the highest combined gross, we see several from the Marvel franchise (Robert Downey Jr., Chris Evans, Mark Ruffalo) and the Harry Potter series (Daniel Radcliffe, Rupert Grint).
As for genres, drama is by far the most commonly listed. Of course, movies can be listed under several genres, and it’s pretty hard to make a movie without some kind of drama, so this might not be very informative. Among actors and directors, some that stand out are Robert De Niro for his number of drama and crime listings, and Hayao Miyazaki for his 11 animated films.
Finally, I’ll make a visual of the two most controversial movies on the list: I Am Sam, which users favored most over critics, and Boyhood, which ties with Notorious as the film critics favored most over users.
boyhood <- "/Users/taylorparchment/Desktop/Boyhood_(2014).png"
iamsam <- "/Users/taylorparchment/Desktop/IAmSam.png"
# filter two controversial titles
# get their two score types on different rows
controversial <- imdb_df %>%
filter(Series_Title == "Boyhood" | Series_Title == "I Am Sam") %>%
mutate(Viewers = IMDB_Rating * 10,
Critics = Meta_score) %>%
pivot_longer(cols = c(Viewers, Critics), names_to = "Score_Type", values_to = "Score")
movie_bar_chart <- ggplot(controversial, aes(x = Series_Title, y = Score, fill = Score_Type)) +
# side by side bars
geom_col(position = position_dodge()) +
#change visual details
theme_transparent() +
coord_cartesian(ylim = c(0, max(controversial$Score) + 60)) +
scale_y_continuous(breaks=c(25, 50, 75, 100)) +
# display and format labels
labs(title = "Viewers vs Critics",
subtitle = "Most controversial movies between viewers and critics of IMDb's top 1000",
fill = "") +
theme(legend.position = "bottom",
plot.title = element_text(face = "bold",
size = 17, hjust = 0.5)) +
# add score value labels on the bars
geom_text(
aes(label = Score),
color = "white", size = 3,
vjust = 2, position = position_dodge(.9), fontface ="bold") +
# add Boyhood text
geom_label(aes(x= 0.41, y = 150), label = "2014 experimental film\nfollowing a boy's real\nadolescence", size = 3, hjust = 0, vjust = 0.5, show.legend = FALSE) +
geom_label(aes(x= 0.41, y = 134), label = "Won the Oscar for Best\nSupporting Actress", size = 3, hjust = 0, vjust = 0.5, show.legend = FALSE) +
geom_label(aes(x= 0.41, y = 118), label = 'Hailed as "epic in scope",\n"astonishing achievement"\nby critics', size = 3, hjust = 0, vjust = 0.5, show.legend = FALSE) +
# add I Am Sam text
geom_label(aes(x= 1.99, y = 150), label = "2001 drama about a\ndisabled man's fight for\ncustody of daughter", size = 3, hjust = 0, vjust = 0.5, show.legend = FALSE, fill = "lightblue") +
geom_label(aes(x= 1.99, y = 134), label = 'Described as "powerful",\n"heartwarming" by viewers' , size = 3, hjust = 0, vjust = 0.5, show.legend = FALSE, fill = "lightblue") +
geom_label(aes(x= 1.99, y = 118), label = 'Critics say "contrived",\n"insensitive", "shamelessly\nsentimental"' , size = 3, hjust = 0, vjust = 0.5, show.legend = FALSE, fill = "lightblue") +
# remove x-axis label
xlab("") +
# display movie images
geom_image(
aes(image = boyhood), x = 1.27, y =135, size = 0.3, by = "height") +
geom_image(
aes(image = iamsam), x = 1.73, y =135, size = 0.3, by = "height")
print(movie_bar_chart)
# ggsave("/Users/taylorparchment/Desktop/imdb_chart.png", movie_bar_chart, width = 6, height = 7, bg = "white")